NSF PAR Search | NSF Public Access Repository

Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety

Guan, Zihan; Hu, Mengxuan; Zhu, Ronghang; Li, Sheng; Vullikanti, Anil (July 2025, The Forty-Second International Conference on Machine Learning)

Recent studies have uncovered a troubling vulnerability in the fine-tuning stage of large language models (LLMs): even fine-tuning on entirely benign datasets can lead to a significant increase in the harmfulness of LLM outputs. Building on this finding, our red teaming study takes this threat one step further by developing a more effective attack. Specifically, we analyze and identify samples within benign datasets that contribute most to safety degradation, then fine-tune LLMs exclusively on these samples. We approach this problem from an outlier detection perspective and propose Self-Inf-N to detect and extract outliers for fine-tuning. Our findings reveal that fine-tuning LLMs on 100 outlier samples selected by Self-Inf-N from benign datasets severely compromises LLM safety alignment. Extensive experiments across seven mainstream LLMs demonstrate that our attack exhibits high transferability across different architectures and remains effective in practical scenarios. Alarmingly, our results indicate that most existing mitigation strategies fail to defend against this attack, underscoring the urgent need for more robust alignment safeguards.
Free, publicly-accessible full text available July 14, 2026
UFID: A Unified Framework for Black-box Input-level Backdoor Detection on Diffusion Models

https://doi.org/10.1609/aaai.v39i26.34941

Guan, Zihan; Hu, Mengxuan; Li, Sheng; Vullikanti, Anil Kumar (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Diffusion models are vulnerable to backdoor attacks, where malicious attackers inject backdoors by poisoning certain training samples during the training stage. This poses a significant threat to real-world applications in the Model-as-a-Service (MaaS) scenario, where users query diffusion models through APIs or directly download them from the internet. To mitigate the threat of backdoor attacks under MaaS, black-box input-level backdoor detection has drawn recent interest, where defenders aim to build a firewall that filters out backdoor samples in the inference stage, with access only to input queries and the generated results from diffusion models. Despite some preliminary explorations on the traditional classification tasks, these methods cannot be directly applied to the generative tasks due to two major challenges: (1) more diverse failures and (2) a multi-modality attack surface. In this paper, we propose a black-box input-level backdoor detection framework on diffusion models, called UFID. Our defense is motivated by an insightful causal analysis: Backdoor attacks serve as the confounder, introducing a spurious path from input to target images, which remains consistent even when we perturb the input samples with Gaussian noise. We further validate the intuition with theoretical analysis. Extensive experiments across different datasets on both conditional and unconditional diffusion models show that our method achieves superb performance on detection effectiveness and run-time efficiency.
Free, publicly-accessible full text available April 11, 2026
Sample Complexity of Linear Regression Models for Opinion Formation in Networks

https://doi.org/10.1609/aaai.v39i13.33531

Liu, Haolin; Rajaraman, Rajmohan; Sundaram, Ravi; Vullikanti, Anil Kumar; Wasim, Omer; Xu, Haifeng (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, so as to avoid polarization and ensure community-wide convergence of opinion? To tackle such challenges, we initiate the study of sample complexity of opinion formation in networks. Our framework is built on the recognized opinion formation game, where we regard each agent’s opinion as a data-derived model, unlike previous works that treat opinions as data-independent scalars. The opinion model for every agent is initially learned from its local samples and evolves game-theoretically as all agents communicate with neighbors and revise their models towards an equilibrium. Our focus is on the sample complexity needed to ensure that the opinions converge to an equilibrium such that every agent’s final model has low generalization error. Our paper has two main technical results. First, we present a novel polynomial time optimization framework to quantify the total sample complexity for arbitrary networks, when the underlying learning problem is (generalized) linear regression. Second, we leverage this optimization to study the network gain which measures the improvement of sample complexity when learning over a network compared to that in isolation. Towards this end, we derive network gain bounds for various network classes including cliques, star graphs, and random regular graphs. Additionally, our framework provides a method to study sample distribution within the network, suggesting that it is sufficient to allocate samples inversely to the degree. Empirical results on both synthetic and real-world networks strongly support our theoretical findings.
Free, publicly-accessible full text available April 11, 2026
Stochastic Optimization and Learning for Two-Stage Supplier Problems

https://doi.org/10.1145/3604619

Brubach, Brian; Grammel, Nathaniel; Harris, David G; Srinivasan, Aravind; Tsepenekas, Leonidas; Vullikanti, Anil (March 2025, ACM Transactions on Probabilistic Machine Learning)

The main focus of this article is radius-based (supplier) clustering in the two-stage stochastic setting with recourse, where the inherent stochasticity of the model comes in the form of a budget constraint. In addition to the standard (homogeneous) setting where all clients must be within a distance \(R\) of the nearest facility, we provide results for the more general problem where the radius demands may be inhomogeneous (i.e., different for each client). We also explore a number of variants where additional constraints are imposed on the first-stage decisions, specifically matroid and multi-knapsack constraints, and provide results for these settings. We derive results for the most general distributional setting, where there is only black-box access to the underlying distribution. To accomplish this, we first develop algorithms for the polynomial scenarios setting; we then employ a novel scenario-discarding variant of the standard Sample Average Approximation method, which crucially exploits properties of the restricted-case algorithms. We note that the scenario-discarding modification to the SAA method is necessary to optimize over the radius.
Free, publicly-accessible full text available March 31, 2026
Identifying and forecasting importation and asymptomatic spreaders of multi-drug resistant organisms in hospital settings

https://doi.org/10.1038/s41746-025-01529-x

Cui, Jiaming; Heavey, Jack; Klein, Eili; Madden, Gregory R; Sifri, Costi D; Vullikanti, Anil; Prakash, B Aditya (March 2025, npj Digital Medicine)
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data

Hu, Mengxuan; Guan, Zihan; Zeng, Yi; Guo, Junfeng; Zhou, Zhongliang; Zhang, Jielu; Jia, Ruoxi; Vullikanti, Anil; Li, Sheng (January 2025, International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available January 22, 2026
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data

Hu, Mengxuan; Guan, Zihan; Zeng, Yi; Guo, Junfeng; Zhou, Zhongliang; Zhang, Jielu; Jia, Ruoxi; Vullikanti, Anil Kumar; Li, Sheng (January 2025, International Conference on Learning Representations)

Anti-backdoor learning, which aims to train clean models directly from poisoned datasets, serves as an important defense against backdoor attacks. However, existing methods usually fail to recover backdoored samples to their original, correct labels and generalize poorly to large pre-trained models because their training is not end-to-end, making them unsuitable for protecting the increasingly prevalent large pre-trained models. To bridge the gap, we first revisit the anti-backdoor learning problem from a causal perspective. Our theoretical causal analysis reveals that incorporating both images and the associated attack indicators preserves the model’s integrity. Building on the theoretical analysis, we introduce an end-to-end method, Mind Control through Causal Inference (MCCI), to train clean models directly from poisoned datasets. This approach leverages both the image and the attack indicator to train the model. Based on this training paradigm, the model’s perception of whether an input is clean or backdoored can be controlled. Typically, by introducing fake non-attack indicators, the model perceives all inputs as clean and makes correct predictions, even for poisoned samples. Extensive experiments demonstrate that our method achieves state-of-the-art performance, efficiently recovering the original correct predictions for poisoned samples and enhancing accuracy on clean samples.
Free, publicly-accessible full text available January 22, 2026
Computing Epidemic Metrics with Edge Differential Privacy

Li, George; Nguyen, Dung; Vullikanti, Anil (May 2024, Proceedings of Machine Learning Research)

Full Text Available
Computing epidemic metrics with edge differential privacy

Li, George; Nguyen, Dung; Vullikanti, Anil (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics)

Full Text Available
Faster approximate subgraph counts with privacy

Nguyen, Dung; Halappanavar, Mahantesh; Srinivasan, Venkatesh; Vullikanti, Anil (May 2024, NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems)

One of the most common problems studied in the context of differential privacy for graph data is counting the number of non-induced embeddings of a subgraph in a given graph. These counts have very high global sensitivity. Therefore, adding noise based on powerful alternative techniques, such as smooth sensitivity and higher-order local sensitivity, has been shown to give significantly better accuracy. However, all these alternatives to global sensitivity become computationally very expensive, and to date efficient polynomial time algorithms are known only for a few selected subgraphs, such as triangles, k-triangles, and k-stars. In this paper, we show that good approximations to these sensitivity metrics can still be used to get private algorithms. Using this approach, we obtain much faster algorithms for privately counting the number of triangles in real-world social networks, which can be easily parallelized. We also give a private polynomial time algorithm for counting any constant size subgraph using less noise than the global sensitivity; we show this can be improved significantly for counting paths in special classes of graphs.
Full Text Available
